Deep Learning Course - Assignment 1

Dog Breed Identification

Submitted by: Itay Bouganim, Ido Rom and Shauli Genish

Problem Statement

We are provided with a training set and a test set of images of dogs. Each image has a filename that is its unique id. The dataset comprises 120 breeds of dogs. The goal is to create a classifier capable of determining a dog's breed from a photo. The list of breeds is as follows:

Task Description

In this task, we were provided a strictly canine subset of ImageNet in order to practice fine-grained image categorization. How well can we tell our Norfolk Terriers from our Norwich Terriers? With 120 breeds of dogs and a limited number of training images per class, we might find the problem more, err, ruff than we anticipated.

Kaggle competition link - containing the dataset and labels

Check for existing physical GPU
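A minimal sketch of such a check, assuming TensorFlow is the framework in use:

```python
import tensorflow as tf

# List the physical GPU devices visible to TensorFlow; an empty list
# means training will fall back to the CPU.
gpus = tf.config.list_physical_devices('GPU')
print(f"Num GPUs available: {len(gpus)}")
```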

1. Preprocessing The Data

1.a. Check our dataset size for the train and test data

1.b. Read labels and assign filenames

The dataset does not contain labels for the test data. We will read the attached CSV file, which contains labels only for the train samples.

We can see that we indeed have 120 labels in the dataset.
The test data is not labeled, so in order to measure our progress we will submit our results to Kaggle and also hold out some validation data that our model did not train on to help us visualize our progress.
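A sketch of this step (the file name `labels.csv` and the `id`/`breed` columns follow the Kaggle competition layout; the rows below are illustrative stand-ins):

```python
import io
import pandas as pd

# In the real notebook we would read the competition file, e.g.:
#   labels = pd.read_csv('labels.csv')
# Here a tiny in-memory sample stands in for it.
sample_csv = io.StringIO(
    "id,breed\n"
    "000bec180eb18c7604dcecc8fe0dba07,boston_bull\n"
    "001513dfcb2ffafc82cccf4d8bbaba97,dingo\n"
    "001cdf01b096e06d78e9e5112d419397,pekinese\n"
    "00214f311d5d2247d5dfe4fe24b2303d,dingo\n"
)
labels = pd.read_csv(sample_csv)
n_breeds = labels['breed'].nunique()
print(n_breeds)  # 3 in this toy sample; 120 in the full dataset
```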

1.c. Look at the sample count for each label

Explore the sample count from each label

Most common labels

Least common labels

Plot labels distribution histogram

Uneven data problem

Since the number of samples per label is uneven, we can expect the learning process to be harder for some classes than for others.

We need to overcome this by weighting our labels relative to their presence in the dataset, to avoid biasing the model toward the more common breeds.
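One way to sketch such weighting, using scikit-learn's balanced class weights on illustrative labels (the breed names and counts below are made up):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy label array standing in for the breed column; rarer classes
# receive proportionally larger weights.
y = np.array(['dingo'] * 100 + ['pekinese'] * 50 + ['boston_bull'] * 25)
classes = np.unique(y)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
class_weight = dict(zip(classes, weights))
print(class_weight)  # weight = n_samples / (n_classes * class_count)
```

A dictionary of this shape can be passed to Keras' `model.fit(..., class_weight=...)`.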

Further data analysis

We can see that the number of training samples varies between labels.
We want to better understand the amount and dimensions of our image data.

Plot the distribution of per-class image counts and of image widths and heights

Uneven data dimensions

We can see that our image dimensions vary, so we will need to resize the samples before passing them to our model.
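A minimal resizing sketch using Pillow (the 224x224 target size is an assumption; the actual input size depends on the model):

```python
import numpy as np
from PIL import Image

# A random RGB array stands in for one of the dataset images; in the
# notebook the source images have varying widths and heights.
raw = Image.fromarray(np.random.randint(0, 256, (375, 500, 3), dtype=np.uint8))
resized = raw.resize((224, 224))           # resize to the model's input size
batch_ready = np.asarray(resized) / 255.0  # scale pixel values to [0, 1]
print(batch_ready.shape)  # (224, 224, 3)
```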

1.d. Similar Datasets Benchmarks

The following articles are strongly related to the problem we are trying to solve here.
Although they do not use the exact same dog breed dataset, they use dog breed datasets with a label count similar to ours, and they mostly use transfer learning techniques.
Dog Breed Identification - Stanford University
Dog Breed Identification - University of Waterloo
Dog Breed Identification - ResearchGate

The following tables and charts compare the results obtained using different well-known CNN models on a similar dog-identification problem:
We can see the metrics for train and validation data with and without augmentation.

1.e. Plot samples from our dog breed image samples

From a brief examination of the data we can see that it varies considerably.

All of these factors, together with the fact that this is a relatively small dataset for 120 labels, can make the training process harder and need to be taken into account.

Distinguishable samples

Hard to distinguish samples

Dog Focus differences

Showing not only dogs

Obstructed by noticeable items

2. Create simple CNN model to try and solve the problem

We will use the KFold cross validation technique with K = 5

2.a. KFold

In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples. Of the k subsamples,
a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data.
The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data.
The k results can then be averaged to produce a single estimation.
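The procedure above can be sketched with scikit-learn (placeholder data standing in for the image samples):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(20, 1)  # placeholder samples
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    # Each fold holds out 1/5 of the data for validation and
    # trains on the remaining 4/5.
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
```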

We can see that our naive model approach did not work

The model did not manage to learn the data, and the validation results are almost entirely random. Let's suggest a few ways that could possibly improve this.

2.b. Suggested ways to improve towards second iteration

We can see that we encounter an overfitting problem, since the training accuracy increases while the validation loss stops decreasing at an early stage.

Simple CNN model iteration 2 - Confusion Matrix

Plot a confusion matrix to get a visual idea of how our second model classified the labeled data.
Since we do not have labeled test data in this dataset, we will use one of the models trained in a fold and use its labeled validation data as the true labels.
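A small sketch of building such a matrix with scikit-learn (the true/predicted label indices below are hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted breed indices from one validation fold.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Row i, column j counts samples of true class i predicted as class j,
# so every sample appears exactly once and the diagonal holds the hits.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

In the notebook the matrix would then be rendered with e.g. `matplotlib`'s `imshow`.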

Classification examples

Prepare Kaggle submission CSV file
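A sketch of the expected submission layout, one probability column per breed (the breed names and test ids below are placeholders; the real file has an `id` column plus 120 breed columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
breeds = ['boston_bull', 'dingo', 'pekinese']  # 120 columns in the real file
test_ids = ['img_a', 'img_b']                  # hypothetical test ids

# Softmax-normalise raw scores so each row is a probability distribution.
scores = rng.random((len(test_ids), len(breeds)))
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

submission = pd.DataFrame(probs, columns=breeds)
submission.insert(0, 'id', test_ids)
submission.to_csv('submission.csv', index=False)
print(submission.shape)  # (2, 4)
```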

Kaggle multiclass loss score: 7.06920

Lower is better

2.c. Misclassification Causes

We can see that our model misclassifies mainly because of overfitting,
so the changes that will help us the most are those that fight the overfitting problem
while keeping our model's training time reasonable.

Ways to further improve the model towards third iteration

Stratified KFold

In stratified k-fold cross-validation, the partitions are selected so that the mean response value is approximately equal in all the partitions.
In the case of binary classification, this means that each partition contains roughly the same proportions of the two types of class labels.

Use Stratified KFold to better represent the label distribution in each fold

This way, each breed is represented in every fold in proportion to its total sample count in the training data.
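A small demonstration with scikit-learn on deliberately imbalanced toy labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 30 samples of class 0, 10 of class 1.
y = np.array([0] * 30 + [1] * 10)
X = np.zeros((40, 1))  # placeholder features
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Every validation fold preserves the overall 3:1 class ratio.
    print(f"fold {fold}: class counts = {np.bincount(y[val_idx])}")
```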

Improve our model from previous iteration to address the improvement suggestions mentioned above

2.d. Simple CNN model improvement

The goal is to address the suggestions for improvement suggested above to overcome the overfitting problem we encountered in the first iteration of the model

Overall we see that the model still overfits at a later epoch of training, but less drastically than in our previous iteration. When it comes to classification, our model still under-performs due to the limited number of samples per label.

Create Kaggle submission file

Kaggle multiclass loss score: 6.61512

Lower is better

2.e. Data Augmentation

Use data augmentation on the training data to increase total training samples and to better generalize the input

As we can see, we chose fairly loose augmentation settings for our dataset due to its small size.
We will use 40° rotation, zoom of up to 20%, height and width shifts of up to 20%, shear of up to 20%, and horizontal flipping.
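These settings can be expressed with Keras' (legacy) `ImageDataGenerator`; the values below mirror the ones listed above:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # rotate up to 40 degrees
    zoom_range=0.2,          # zoom in/out by up to 20%
    width_shift_range=0.2,   # horizontal shift of up to 20%
    height_shift_range=0.2,  # vertical shift of up to 20%
    shear_range=0.2,         # shear of up to 20%
    horizontal_flip=True,
)

image = np.random.rand(224, 224, 3)          # stand-in for one sample
augmented = datagen.random_transform(image)  # one randomly transformed copy
print(augmented.shape)  # (224, 224, 3)
```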

Plot a sample of the augmentation result on a single random sample

We can see that using data augmentation improved our overall results; possibly, with more training epochs, this simple model would be able to lower the train/validation loss further.

We can see from the samples that we got a small improvement in terms of classification.
The percentage of correct predictions went up, and some of the remaining wrong predictions are between breeds that are genuinely tricky to tell apart.

Create Kaggle submission file

What is the cause for the validation accuracy being greater than training accuracy?

We can observe that we achieve validation accuracy >= training accuracy some of the time. This may be caused by the added dropout layers in the model. A possible explanation:

Due to disabling neurons, some of the information about each sample is lost, and the subsequent layers attempt to construct the answers based on incomplete representations.

The training loss is higher because we have artificially made it harder for the network to give the right answers. During validation, however, all of the neurons are available, so the network has its full computational power, and thus it might perform better than in training.
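The mechanism can be sketched in plain NumPy as inverted dropout (a common implementation; the exact layer behavior depends on the framework):

```python
import numpy as np

def dropout(x, rate, training):
    """Inverted dropout: zero a fraction of activations during training,
    pass everything through unchanged at validation/inference time."""
    if not training:
        return x  # at validation time all neurons are active
    mask = np.random.default_rng(0).random(x.shape) >= rate
    return x * mask / (1.0 - rate)  # rescale survivors to keep the mean

x = np.ones(8)
print(dropout(x, rate=0.5, training=True))   # some activations zeroed
print(dropout(x, rate=0.5, training=False))  # identical to the input
```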

Overall we see that the model does not overfit, but it still does not perform as well as we would like.
After 60 epochs in total it still seems to be converging, and the validation loss keeps decreasing; possibly with more epochs we would be able to achieve better results.

Kaggle multiclass loss score: 4.11512

Lower is better

3. Transfer learning using InceptionV3

In this section we are going to use transfer learning techniques to solve our problem using the pretrained known model InceptionV3

Inception v3 is a widely-used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset.
The model is the culmination of many ideas developed by multiple researchers over the years. It is based on the original paper: "Rethinking the Inception Architecture for Computer Vision" by Szegedy et al.

A high-level diagram of the model is shown below:

Inception v3 TPU training runs match accuracy curves produced by GPU jobs of similar configuration. The model has been successfully trained on v2-8, v2-128, and v2-512 configurations. The model has attained greater than 78.1% accuracy in about 170 epochs on each of these. The model itself is made up of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concats, dropouts, and fully connected layers. Batchnorm is used extensively throughout the model and applied to activation inputs. Loss is computed via Softmax.

Inception basic block:

Comparison of Inception V3 against other known models:

We can see that InceptionV3 is in the upper spectrum when it comes to accuracy and efficiency.

You can learn more about InceptionV3 in the following article: Inception V3 Paper

3.a Add trainable output layers to the pre-trained model
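A sketch of such a head in Keras (the pooling and dropout choices are assumptions; `weights=None` keeps the sketch self-contained and avoids a download, whereas the notebook would load `weights='imagenet'` to actually transfer the pretrained filters):

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import InceptionV3

# Pretrained convolutional stack without its ImageNet classification head.
base = InceptionV3(include_top=False, weights=None, input_shape=(299, 299, 3))
base.trainable = False  # freeze the base; only the new head will train

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dropout(0.5)(x)                        # dropout rate is an assumption
out = layers.Dense(120, activation='softmax')(x)  # one unit per breed
model = Model(base.input, out)
print(model.output_shape)  # (None, 120)
```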

3.b. We will use a stratified train-test split

3.c. Train our InceptionV3 transfer learning model

We can see from the prediction samples that we got a big improvement by using InceptionV3.
Most of the remaining wrong predictions are hard to separate even just by looking at the pictures:
the confused breeds look very similar in terms of shape, hair, color, and size.
We can also see that the prediction confidences are very high for the top samples, both correct and wrong, which can indicate that our model is very decisive this time around.

We can clearly see from the matrix above that using InceptionV3 as the base model is a big improvement over our custom CNN.

Kaggle multiclass loss score: 1.85799

Lower is better

3.d. Use our Transfer Learning InceptionV3 based model as feature extractor for SVM

Using Linear Support Vector Machine algorithm

We are going to use the model trained in the previous section to extract features from our dataset and apply the support vector machine (SVM) algorithm to classify our data using those features.

About Support Vector Machine (SVM)

Support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.
More formally, a support-vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks like outlier detection.
A good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class (so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.

We will use an RBF (Radial Basis Function) kernel for the decision boundary.
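A minimal sketch with scikit-learn, using synthetic features in place of the InceptionV3 embeddings:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class "features" standing in for the extracted embeddings:
# two well-separated Gaussian clusters in 8 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(4, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='rbf')  # RBF decision boundary, as described above
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy on the separable toy data
```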

3.e. Results Summary and Model comparison

4. Summary Report

Can be found in the attached PDF file.